{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Credits\n",
    "TensorFlow translation of [Lasagne tutorial](https://github.com/DeepLearningDTU/nvidia_deep_learning_summercamp_2016/blob/master/lab2/lab2_CNN.ipynb). Thanks to [skaae](https://github.com/skaae), [casperkaae](https://github.com/casperkaae) and [larsmaaloee](https://github.com/larsmaaloee)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dependancies and supporting functions\n",
    "Loading dependancies and supporting functions by running the code block below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import sklearn.datasets\n",
    "import tensorflow as tf\n",
    "from tensorflow.python.framework.ops import reset_default_graph\n",
    "\n",
    "def onehot(t, num_classes):\n",
    "    out = np.zeros((t.shape[0], num_classes))\n",
    "    for row, col in enumerate(t):\n",
    "        out[row, col] = 1\n",
    "    return out"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Convolutional Neural networks 101\n",
    "\n",
    "Convolution neural networks are one of the most succesfull types of neural networks for image recognition and an integral part of reigniting the interest in neural networks. \n",
    "\n",
    "In this lab we'll experiment with inserting 2D-convolution layers in the fully connected neural networks introduced in LAB1. We'll furhter experiment with stacking of convolution layers, max pooling and strided convolutions which are all important techniques in current convolution neural network architectures. Lastly we'll try to visualize the learned convolution filters and try to understand what kind of features they learn to recognize.\n",
    "\n",
    "\n",
    "If you are unfamilar with the the convolution operation  https://github.com/vdumoulin/conv_arithmetic have a nice visualization of different convolution variants. For a more indept tutorial please see http://cs231n.github.io/convolutional-networks/ or http://neuralnetworksanddeeplearning.com/chap6.html. Lastly if you are ambitious and want implement a convolution neural network from scratch please see an exercise for our Deep Learning summer school last year https://github.com/DTU-deeplearning/day2-Conv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#LOAD the mnist data. To speed up training we'll only work on a subset of the data.\n",
    "#Note that we reshape the data from (nsamples, num_features)= (nsamples, nchannels*rows*cols)  -> (nsamples, nchannels, rows, cols)\n",
    "# in order to retain the spatial arrangements of the pixels\n",
    "data = np.load('mnist.npz')\n",
    "num_classes = 10\n",
    "nchannels,rows,cols = 1,28,28\n",
    "x_train = data['X_train'][:1000].astype('float32')\n",
    "x_train = x_train.reshape((-1,nchannels,rows,cols))\n",
    "targets_train = data['y_train'][:1000].astype('int32')\n",
    "\n",
    "x_valid = data['X_valid'][:500].astype('float32')\n",
    "x_valid = x_valid.reshape((-1,nchannels,rows,cols))\n",
    "targets_valid = data['y_valid'][:500].astype('int32')\n",
    "\n",
    "x_test = data['X_test'][:500].astype('float32')\n",
    "x_test = x_test.reshape((-1,nchannels,rows,cols))\n",
    "targets_test = data['y_test'][:500].astype('int32')\n",
    "\n",
    "print \"Information on dataset\"\n",
    "print \"x_train\", x_train.shape\n",
    "print \"targets_train\", targets_train.shape\n",
    "print \"x_valid\", x_valid.shape\n",
    "print \"targets_valid\", targets_valid.shape\n",
    "print \"x_test\", x_test.shape\n",
    "print \"targets_test\", targets_test.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#plot a few MNIST examples\n",
    "idx = 0\n",
    "canvas = np.zeros((28*10, 10*28))\n",
    "for i in range(10):\n",
    "    for j in range(10):\n",
    "        canvas[i*28:(i+1)*28, j*28:(j+1)*28] = x_train[idx].reshape((28, 28))\n",
    "        idx += 1\n",
    "plt.figure(figsize=(7, 7))\n",
    "plt.imshow(canvas, cmap='gray')\n",
    "plt.title('MNIST handwritten digits')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Documentation on contrib layers\n",
    "Check out the [github page](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py) for information on contrib layers (not well documented in their api)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from tensorflow.contrib.layers import fully_connected, convolution2d, flatten, batch_norm, max_pool2d, dropout\n",
    "from tensorflow.python.ops.nn import relu, elu, relu6, sigmoid, tanh, softmax"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# define a simple feed forward neural network\n",
    "\n",
    "# hyperameters of the model\n",
    "num_classes = 10\n",
    "channels = x_train.shape[1]\n",
    "height = x_train.shape[2]\n",
    "width = x_train.shape[3]\n",
    "num_filters_conv1 = 16\n",
    "kernel_size_conv1 = [5, 5] # [height, width]\n",
    "stride_conv1 = [1, 1] # [stride_height, stride_width]\n",
    "num_l1 = 100\n",
    "# resetting the graph ...\n",
    "reset_default_graph()\n",
    "\n",
    "# Setting up placeholder, this is where your data enters the graph!\n",
    "x_pl = tf.placeholder(tf.float32, [None, channels, height, width])\n",
    "l_reshape = tf.transpose(x_pl, [0, 2, 3, 1]) # TensorFlow uses NHWC instead of NCHW\n",
    "#is_training = tf.placeholder(tf.bool)#used for dropout\n",
    "\n",
    "# Building the layers of the neural network\n",
    "# we define the variable scope, so we more easily can recognise our variables later\n",
    "#l_conv1 = convolution2d(l_reshape, num_filters_conv1, kernel_size_conv1, stride_conv1, scope=\"l_conv1\")\n",
    "l_flatten = flatten(l_reshape, scope=\"flatten\") # use l_conv1 instead of l_reshape\n",
    "l1 = fully_connected(l_flatten, num_l1, activation_fn=relu, scope=\"l1\")\n",
    "#l1 = dropout(l1, is_training=is_training, scope=\"dropout\")\n",
    "y = fully_connected(l1, num_classes, activation_fn=softmax, scope=\"y\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# y_ is a placeholder variable taking on the value of the target batch.\n",
    "y_ = tf.placeholder(tf.float32, [None, num_classes])\n",
    "\n",
    "# computing cross entropy per sample\n",
    "cross_entropy = -tf.reduce_sum(y_ * tf.log(y+1e-8), reduction_indices=[1])\n",
    "\n",
    "# averaging over samples\n",
    "cross_entropy = tf.reduce_mean(cross_entropy)\n",
    "\n",
    "# defining our optimizer\n",
    "optimizer = tf.train.AdamOptimizer(learning_rate=0.001)\n",
    "\n",
    "# applying the gradients\n",
    "train_op = optimizer.minimize(cross_entropy)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Test the forward pass\n",
    "x = np.random.normal(0,1, (45, 1,28,28)).astype('float32') #dummy data\n",
    "\n",
    "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n",
    "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n",
    "# initialize the Session\n",
    "sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts))\n",
    "sess.run(tf.initialize_all_variables())\n",
    "res = sess.run(fetches=[y], feed_dict={x_pl: x})\n",
    "#res = sess.run(fetches=[y], feed_dict={x_pl: x, is_training: False}) # for when using dropout\n",
    "print \"y\", res[0].shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Training Loop\n",
    "from confusionmatrix import ConfusionMatrix\n",
    "batch_size = 100\n",
    "num_epochs = 10\n",
    "num_samples_train = x_train.shape[0]\n",
    "num_batches_train = num_samples_train // batch_size\n",
    "num_samples_valid = x_valid.shape[0]\n",
    "num_batches_valid = num_samples_valid // batch_size\n",
    "\n",
    "train_acc, train_loss = [], []\n",
    "valid_acc, valid_loss = [], []\n",
    "test_acc, test_loss = [], []\n",
    "cur_loss = 0\n",
    "loss = []\n",
    "try:\n",
    "    for epoch in range(num_epochs):\n",
    "        #Forward->Backprob->Update params\n",
    "        cur_loss = 0\n",
    "        for i in range(num_batches_train):\n",
    "            idx = range(i*batch_size, (i+1)*batch_size)\n",
    "            x_batch = x_train[idx]\n",
    "            target_batch = targets_train[idx]\n",
    "            feed_dict_train = {x_pl: x_batch, y_: onehot(target_batch, num_classes)}\n",
    "            #feed_dict_train = {x_pl: x_batch, y_: onehot(target_batch, num_classes), is_training: True}\n",
    "            fetches_train = [train_op, cross_entropy]\n",
    "            res = sess.run(fetches=fetches_train, feed_dict=feed_dict_train)\n",
    "            batch_loss = res[1] #this will do the complete backprob pass\n",
    "            cur_loss += batch_loss\n",
    "        loss += [cur_loss/batch_size]\n",
    "\n",
    "        confusion_valid = ConfusionMatrix(num_classes)\n",
    "        confusion_train = ConfusionMatrix(num_classes)\n",
    "\n",
    "        for i in range(num_batches_train):\n",
    "            idx = range(i*batch_size, (i+1)*batch_size)\n",
    "            x_batch = x_train[idx]\n",
    "            targets_batch = targets_train[idx]\n",
    "            # what to feed our accuracy op\n",
    "            feed_dict_eval_train = {x_pl: x_batch}\n",
    "            #feed_dict_eval_train = {x_pl: x_batch, is_training: False}\n",
    "            # deciding which parts to fetch\n",
    "            fetches_eval_train = [y]\n",
    "            # running the validation\n",
    "            res = sess.run(fetches=fetches_eval_train, feed_dict=feed_dict_eval_train)\n",
    "            # collecting and storing predictions\n",
    "            net_out = res[0] \n",
    "            preds = np.argmax(net_out, axis=-1) \n",
    "            confusion_train.batch_add(targets_batch, preds)\n",
    "\n",
    "        confusion_valid = ConfusionMatrix(num_classes)\n",
    "        for i in range(num_batches_valid):\n",
    "            idx = range(i*batch_size, (i+1)*batch_size)\n",
    "            x_batch = x_valid[idx]\n",
    "            targets_batch = targets_valid[idx]\n",
    "            # what to feed our accuracy op\n",
    "            feed_dict_eval_train = {x_pl: x_batch}\n",
    "            #feed_dict_eval_train = {x_pl: x_batch, is_training: False}\n",
    "            # deciding which parts to fetch\n",
    "            fetches_eval_train = [y]\n",
    "            # running the validation\n",
    "            res = sess.run(fetches=fetches_eval_train, feed_dict=feed_dict_eval_train)\n",
    "            # collecting and storing predictions\n",
    "            net_out = res[0]\n",
    "            preds = np.argmax(net_out, axis=-1) \n",
    "\n",
    "            confusion_valid.batch_add(targets_batch, preds)\n",
    "\n",
    "        train_acc_cur = confusion_train.accuracy()\n",
    "        valid_acc_cur = confusion_valid.accuracy()\n",
    "\n",
    "        train_acc += [train_acc_cur]\n",
    "        valid_acc += [valid_acc_cur]\n",
    "        print \"Epoch %i : Train Loss %e , Train acc %f,  Valid acc %f \" \\\n",
    "        % (epoch+1, loss[-1], train_acc_cur, valid_acc_cur)\n",
    "except KeyboardInterrupt:\n",
    "    pass\n",
    "    \n",
    "\n",
    "#get test set score\n",
    "confusion_test = ConfusionMatrix(num_classes)\n",
    "# what to feed our accuracy op\n",
    "feed_dict_eval_train = {x_pl: x_test}\n",
    "#feed_dict_eval_train = {x_pl: x_test, is_training: False}\n",
    "# deciding which parts to fetch\n",
    "fetches_eval_train = [y]\n",
    "# running the validation\n",
    "res = sess.run(fetches=fetches_eval_train, feed_dict=feed_dict_eval_train)\n",
    "# collecting and storing predictions\n",
    "net_out = res[0] \n",
    "preds = np.argmax(net_out, axis=-1) \n",
    "confusion_test.batch_add(targets_test, preds)\n",
    "print \"\\nTest set Acc:  %f\" %(confusion_test.accuracy())\n",
    "\n",
    "\n",
    "epoch = np.arange(len(train_acc))\n",
    "plt.figure()\n",
    "plt.plot(epoch,train_acc,'r',epoch,valid_acc,'b')\n",
    "plt.legend(['Train Acc','Val Acc'])\n",
    "plt.xlabel('Epochs'), plt.ylabel('Acc'), plt.ylim([0.75,1.03])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Assignments 1\n",
    "\n",
    " 1) Note the performance of the standard feedforward neural network. Add a 2D convolution layer before the dense hidden layer and confirm that it increases the generalization performance of the network (try num_filters=16 and filter_size=5 as a starting point). \n",
    " \n",
    " 2) Can the performance be increases even further by stacking more convolution layers ?\n",
    " \n",
    " 3) Maxpooling is a technique for decreasing the spatial resolution of an image while retaining the important features. Effectively this gives a local translational invariance and reduces the computation by a factor of four. In the classification algorithm which is usually desirable. Try to either: \n",
    " \n",
    "     a) add a maxpool layer(add arguement pool_size=2)  after the convolution layer or\n",
    "     b) set add stride=2 to the arguments of the convolution layer. \n",
    "  Verify that this decreases spatial dimension of the image. (print l_conv.output_shape or print   l_maxpool.output_shape). Does this increase the performance of the network (you may need to stack multiple layers or increase the number of filters to increase performance) ?\n",
    "  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Visualization of filters\n",
    "Convolution filters can be interpreted as spatial feature detectors picking up different image features such as edges, corners etc. Below we provide code for visualization of the filters. The best results are obtained with fairly large filters of size 9 and either 16 or 36 filters. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# to start with we print the names of the weights in our graph\n",
    "# to see what operations we are allowed to perform on the variables in our graph, try:\n",
    "#print(dir(tf.all_variables()[0]))\n",
    "# you will notice it has \"name\" and \"value\", which we will build a dictionary from\n",
    "names_and_vars = {var.name: sess.run(var.value()) for var in tf.all_variables()}\n",
    "print(names_and_vars.keys())\n",
    "# getting the name was easy, just use .name on the variable object\n",
    "# getting the value in a numpy array format is slightly more tricky\n",
    "# we need to first get a variable object, then turn it into a tensor with .value()\n",
    "# and the evaluate the tensor with sess.run(...)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "### ERROR - If you get a key error, then you need to define l_conv1 in your model!\n",
    "if not u'l_conv1/weights:0' in names_and_vars:\n",
    "    print \"You need to go back and define a convolutional layer in the network.\"\n",
    "else:\n",
    "    np_W = names_and_vars[u'l_conv1/weights:0'] # get the filter values from the first conv layer\n",
    "    print np_W.shape, \"i.e. the shape is filter_size, filter_size, num_channels, num_filters\"\n",
    "    filter_size, _, num_channels, num_filters = np_W.shape\n",
    "    n = int(num_filters**0.5)\n",
    "\n",
    "    # reshaping the last dimension to be n by n\n",
    "    np_W_res = np_W.reshape(filter_size, filter_size, num_channels, n, n)\n",
    "    fig, ax = plt.subplots(n,n)\n",
    "    print \"learned filter values\"\n",
    "    for i in range(n):\n",
    "        for j in range(n):\n",
    "            ax[i,j].imshow(np_W_res[:,:,0,i,j], cmap='gray',interpolation='none')\n",
    "            ax[i,j].xaxis.set_major_formatter(plt.NullFormatter())\n",
    "            ax[i,j].yaxis.set_major_formatter(plt.NullFormatter())\n",
    "\n",
    "\n",
    "    idx = 1\n",
    "    plt.figure()\n",
    "    plt.imshow(x_train[idx,0],cmap='gray',interpolation='none')\n",
    "    plt.title('Inut Image')\n",
    "    plt.show()\n",
    "\n",
    "    #visalize the filters convolved with an input image\n",
    "    from scipy.signal import convolve2d\n",
    "    np_W_res = np_W.reshape(filter_size, filter_size, num_channels, n, n)\n",
    "    fig, ax = plt.subplots(n,n,figsize=(9,9))\n",
    "    print \"Response from input image convolved with the filters\"\n",
    "    for i in range(n):\n",
    "        for j in range(n):\n",
    "            ax[i,j].imshow(convolve2d(x_train[1,0],np_W_res[:,:,0,i,j],mode='same'),\n",
    "                           cmap='gray',interpolation='none')\n",
    "            ax[i,j].xaxis.set_major_formatter(plt.NullFormatter())\n",
    "            ax[i,j].yaxis.set_major_formatter(plt.NullFormatter())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Assignment 2\n",
    "\n",
    "The visualized filters will likely look most like noise due to the small amount of training data.\n",
    "\n",
    " 1) Try to use 10000 traning examples instead and visualise the filters again\n",
    " \n",
    " 2) Dropout is a very usefull technique for preventing overfitting. Try to add a DropoutLayer after the convolution layer and hidden layer. This should increase both performance and the \"visual appeal\" of the filters\n",
    " \n",
    " 3) Batch normalization is a recent innovation for improving generalization performance. Try to insert batch normalization layers into the network to improve performance. \n",
    " \n",
    " \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# More Fun with convolutional networks\n",
    "### Get the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "!wget -N https://s3.amazonaws.com/lasagne/recipes/datasets/mnist_cluttered_60x60_6distortions.npz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the data the each mnist digit (20x20 pixels) has been placed randomly in a 60x60 canvas. To make the task harder each canvas has then been cluttered with small pieces of digits. In this task it is helpfull for a network if it can focus only on the digit and ignore the rest.\n",
    "\n",
    "The ``TransformerLayer`` lets us do this. The transformer layer learns an affine transformation which lets the network zoom, rotate and skew. If you are interested you should read the paper, but the main idea is that you can let a small convolutional network determine the the parameters of the affine transformation. You then apply the affine transformation to the input data. Usually this also involves downsampling which forces the model to zoom in on the relevant parts of the data. After the affine transformation we can use a larger conv net to do the classification. \n",
    "This is possible because you can backprop through a an affine transformation if you use bilinear interpolation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import matplotlib\n",
    "import numpy as np\n",
    "np.random.seed(123)\n",
    "import matplotlib.pyplot as plt\n",
    "import tensorflow as tf\n",
    "from tensorflow.contrib.layers import fully_connected, convolution2d, flatten, max_pool2d\n",
    "pool = max_pool2d\n",
    "conv = convolution2d\n",
    "dense = fully_connected\n",
    "from tensorflow.python.ops.nn import relu, softmax\n",
    "from tensorflow.python.framework.ops import reset_default_graph\n",
    "\n",
    "from spatial_transformer import transformer\n",
    "\n",
    "def onehot(t, num_classes):\n",
    "    out = np.zeros((t.shape[0], num_classes))\n",
    "    for row, col in enumerate(t):\n",
    "        out[row, col] = 1\n",
    "    return out\n",
    "\n",
    "NUM_EPOCHS = 500\n",
    "BATCH_SIZE = 256\n",
    "LEARNING_RATE = 0.001\n",
    "DIM = 60\n",
    "NUM_CLASSES = 10\n",
    "mnist_cluttered = \"mnist_cluttered_60x60_6distortions.npz\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def load_data():\n",
    "    data = np.load(mnist_cluttered)\n",
    "    X_train, y_train = data['x_train'], np.argmax(data['y_train'], axis=-1)\n",
    "    X_valid, y_valid = data['x_valid'], np.argmax(data['y_valid'], axis=-1)\n",
    "    X_test, y_test = data['x_test'], np.argmax(data['y_test'], axis=-1)\n",
    "\n",
    "    # reshape for convolutions\n",
    "    X_train = X_train.reshape((X_train.shape[0], 1, DIM, DIM))\n",
    "    X_valid = X_valid.reshape((X_valid.shape[0], 1, DIM, DIM))\n",
    "    X_test = X_test.reshape((X_test.shape[0], 1, DIM, DIM))\n",
    "    \n",
    "    print \"Train samples:\", X_train.shape\n",
    "    print \"Validation samples:\", X_valid.shape\n",
    "    print \"Test samples:\", X_test.shape\n",
    "\n",
    "    return dict(\n",
    "        X_train=np.asarray(X_train, dtype='float32'),\n",
    "        y_train=y_train.astype('int32'),\n",
    "        X_valid=np.asarray(X_valid, dtype='float32'),\n",
    "        y_valid=y_valid.astype('int32'),\n",
    "        X_test=np.asarray(X_test, dtype='float32'),\n",
    "        y_test=y_test.astype('int32'),\n",
    "        num_examples_train=X_train.shape[0],\n",
    "        num_examples_valid=X_valid.shape[0],\n",
    "        num_examples_test=X_test.shape[0],\n",
    "        input_height=X_train.shape[2],\n",
    "        input_width=X_train.shape[3],\n",
    "        output_dim=10,)\n",
    "data = load_data()\n",
    "\n",
    "idx = 0\n",
    "canvas = np.zeros((DIM*10, 10*DIM))\n",
    "for i in range(10):\n",
    "    for j in range(10):\n",
    "        canvas[i*DIM:(i+1)*DIM, j*DIM:(j+1)*DIM] = data['X_train'][idx].reshape((DIM, DIM))\n",
    "        idx += 1\n",
    "plt.figure(figsize=(10, 10))\n",
    "plt.imshow(canvas, cmap='gray')\n",
    "plt.title('Cluttered handwritten digits')\n",
    "plt.axis('off')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the model\n",
    "\n",
    "We use a model where the localization network is a two layer convolution network which operates directly on the image input. The output from the localization network is a 6 dimensional vector specifying the parameters in the affine transformation.\n",
    "\n",
    "We set up the transformer layer to initially do the identity transform, similarly to [1]. If the output from the localization networks is [t1, t2, t3, t4, t5, t6] then t1 and t5 determines zoom, t2 and t4 determines skewness, and t3 and t6 move the center position. By setting the initial values of the bias vector to \n",
    "\n",
    "```\n",
    "|1, 0, 0|\n",
    "|0, 1, 0|\n",
    "```\n",
    "and the final W of the localization network to all zeros we ensure that in the beginning of training the network works as a pooling layer. \n",
    "\n",
    "The output of the localization layer feeds into the transformer layer which applies the transformation to the image input. In our setup the transformer layer downsamples the input by a factor 3.\n",
    "\n",
    "Finally a 2 layer convolution layer and 2 fully connected layers calculates the output probabilities.\n",
    "\n",
    "\n",
    "### The model\n",
    "```\n",
    "Input -> localization_network -> TransformerLayer -> output_network -> predictions\n",
    "   |                                |\n",
    "   >--------------------------------^\n",
    "```\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "reset_default_graph()\n",
    "def build_model(x_pl, input_width, input_height, output_dim,\n",
    "                batch_size=BATCH_SIZE):\n",
    "    # Setting up placeholder, this is where your data enters the graph!\n",
    "    l_reshape = tf.transpose(x_pl, [0, 2, 3, 1]) # TensorFlow uses NHWC instead of NCHW\n",
    "\n",
    "    # make distributed representation of input image for localization network\n",
    "    loc_l1 = pool(l_reshape, kernel_size=[2, 2], scope=\"loc_l1\")\n",
    "    loc_l2 = conv(loc_l1, num_outputs=8, kernel_size=[5, 5], stride=[1, 1], padding=\"SAME\", scope=\"loc_l2\")\n",
    "    loc_l3 = pool(loc_l2, kernel_size=[2, 2], scope=\"loc_l3\")\n",
    "    loc_l4 = conv(loc_l3, num_outputs=8, kernel_size=[5, 5], stride=[1, 1], padding=\"SAME\", scope=\"loc_l4\")\n",
    "    loc_l4_flatten = flatten(loc_l4, scope=\"loc_l4_flatten\")\n",
    "    loc_l5 = dense(loc_l4_flatten, num_outputs=50, activation_fn=relu, scope=\"loc_l5\")\n",
    "    # set up weights for transformation (notice we always need 6 output neurons)\n",
    "    W_loc_out = tf.get_variable(\"W_loc_out\", [50, 6], initializer=tf.constant_initializer(0.0))\n",
    "    initial = np.array([[1, 0, 0], [0, 1, 0]])\n",
    "    initial = initial.astype('float32')\n",
    "    initial = initial.flatten()\n",
    "    b_loc_out = tf.Variable(initial_value=initial, name='b_loc_out')\n",
    "    loc_out = tf.matmul(loc_l5, W_loc_out) + b_loc_out\n",
    "\n",
    "    # spatial transformer\n",
    "    l_trans1 = transformer(l_reshape, loc_out, out_size=(DIM//3, DIM//3))\n",
    "    l_trans1.set_shape([None, DIM//3, DIM//3, 1])\n",
    "    l_trans1_valid = tf.transpose(l_trans1, [0, 2, 3, 1]) # Back into NCHW for validation\n",
    "\n",
    "    print \"Transformer network output shape: \", l_trans1.get_shape()\n",
    "\n",
    "    # classification network\n",
    "    class_l1 = conv(l_trans1, num_outputs=16, kernel_size=[3, 3], scope=\"class_l1\")\n",
    "    class_l2 = pool(class_l1, kernel_size=[2, 2], scope=\"class_l2\")\n",
    "    class_l3 = conv(class_l2, num_outputs=16, kernel_size=[3, 3], scope=\"class_l3\")\n",
    "    class_l4 = pool(class_l3, kernel_size=[2, 2], scope=\"class_l4\")\n",
    "    class_l4_flatten = flatten(class_l4, scope=\"class_l4_flatten\")\n",
    "    class_l5 = dense(class_l4_flatten, num_outputs=256, activation_fn=relu, scope=\"class_l5\")\n",
    "    l_out = dense(class_l5, num_outputs=output_dim, activation_fn=softmax, scope=\"l_out\")\n",
    "\n",
    "    return l_out, l_trans1_valid\n",
    "\n",
    "x_pl = tf.placeholder(tf.float32, [None, 1, DIM, DIM])\n",
    "model, l_transform = build_model(x_pl, DIM, DIM, NUM_CLASSES)\n",
    "#model_params = lasagne.layers.get_all_params(model, trainable=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# y_ is a placeholder variable taking on the value of the target batch.\n",
    "y_pl = tf.placeholder(tf.float32, shape=[None, NUM_CLASSES])\n",
    "lr_pl = tf.placeholder(tf.float32, shape=[])\n",
    "\n",
    "# computing cross entropy per sample\n",
    "cross_entropy = -tf.reduce_sum(y_pl * tf.log(model+1e-8), reduction_indices=[1])\n",
    "\n",
    "# averaging over samples\n",
    "cross_entropy = tf.reduce_mean(cross_entropy)\n",
    "\n",
    "# defining our optimizer\n",
    "optimizer = tf.train.AdamOptimizer(learning_rate=lr_pl)\n",
    "\n",
    "# applying the gradients\n",
    "train_op = optimizer.minimize(cross_entropy)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# test the forward pass\n",
    "x = np.random.normal(0,1, (45, 1,60,60)).astype('float32') #dummy data\n",
    "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n",
    "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n",
    "# initialize the Session\n",
    "sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts))\n",
    "sess.run(tf.initialize_all_variables())\n",
    "res = sess.run(fetches=[model], feed_dict={x_pl: x})\n",
    "print \"y\", res[0].shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training the model\n",
    "Unfortunately NVIDIA has yet to squeeze a TitanX into a labtop and training convnets on CPU is painfully slow. After 10 epochs you should see that model starts to zoom in on the digits. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def train_epoch(X, y, learning_rate):\n",
    "    num_samples = X.shape[0]\n",
    "    num_batches = int(np.ceil(num_samples / float(BATCH_SIZE)))\n",
    "    costs = []\n",
    "    correct = 0\n",
    "    for i in range(num_batches):\n",
    "        if i % 10 == 0:\n",
    "            print i,\n",
    "        idx = range(i*BATCH_SIZE, np.minimum((i+1)*BATCH_SIZE, num_samples))\n",
    "        X_batch_tr = X[idx]\n",
    "        y_batch_tr = y[idx]\n",
    "        fetches_tr = [train_op, cross_entropy, model]\n",
    "        feed_dict_tr = {x_pl: X_batch_tr, y_pl: onehot(y_batch_tr, NUM_CLASSES), lr_pl: learning_rate}\n",
    "        res = sess.run(fetches=fetches_tr, feed_dict=feed_dict_tr)\n",
    "        cost_batch, output_train = tuple(res[1:3])\n",
    "        costs += [cost_batch]\n",
    "        preds = np.argmax(output_train, axis=-1)\n",
    "        correct += np.sum(y_batch_tr == preds)\n",
    "    print \"\"\n",
    "    return np.mean(costs), correct / float(num_samples)\n",
    "\n",
    "\n",
    "def eval_epoch(X, y):\n",
    "    num_samples = X.shape[0]\n",
    "    num_batches = int(np.ceil(num_samples / float(BATCH_SIZE)))\n",
    "    pred_list = []\n",
    "    transform_list = []\n",
    "    for i in range(num_batches):\n",
    "        if i % 10 == 0:\n",
    "            print i,\n",
    "        idx = range(i*BATCH_SIZE, np.minimum((i+1)*BATCH_SIZE, num_samples))\n",
    "        X_batch_val = X[idx]\n",
    "        fetches_val = [model, l_transform]\n",
    "        feed_dict_val = {x_pl: X_batch_val}\n",
    "        res = sess.run(fetches=fetches_val, feed_dict=feed_dict_val)\n",
    "        output_eval, transform_eval = tuple(res)\n",
    "        pred_list.append(output_eval)\n",
    "        transform_list.append(transform_eval)\n",
    "    transform_eval = np.concatenate(transform_list, axis=0)\n",
    "    preds = np.concatenate(pred_list, axis=0)\n",
    "    preds = np.argmax(preds, axis=-1)\n",
    "    acc = np.mean(preds == y)\n",
    "    print \"\"\n",
    "    return acc, transform_eval"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "valid_accs, train_accs, test_accs = [], [], []\n",
    "learning_rate=0.0001\n",
    "try:\n",
    "    for n in range(NUM_EPOCHS):\n",
    "        print \"Epoch %d:\" % n\n",
    "        print 'train ',\n",
    "        train_cost, train_acc = train_epoch(data['X_train'], data['y_train'], learning_rate)\n",
    "        print 'valid ',\n",
    "        valid_acc, valid_trainsform = eval_epoch(data['X_valid'], data['y_valid'])\n",
    "        print 'test ',\n",
    "        test_acc, test_transform = eval_epoch(data['X_test'], data['y_test'])\n",
    "        valid_accs += [valid_acc]\n",
    "        test_accs += [test_acc]\n",
    "        train_accs += [train_acc]\n",
    "\n",
    "        # learning rate annealing\n",
    "        if (n+1) % 20 == 0:\n",
    "            learning_rate = learning_rate * 0.7\n",
    "            print \"New LR:\", learning_rate\n",
    "\n",
    "        print \"train cost {0:.2}, train acc {1:.2}, val acc {2:.2}, test acc {3:.2}\".format(\n",
    "                train_cost, train_acc, valid_acc, test_acc)\n",
    "except KeyboardInterrupt:\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plot errors and zoom"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(9,9))\n",
    "plt.plot(1-np.array(train_accs), label='Training Error')\n",
    "plt.plot(1-np.array(valid_accs), label='Validation Error')\n",
    "plt.legend(fontsize=20)\n",
    "plt.xlabel('Epoch', fontsize=20)\n",
    "plt.ylabel('Error', fontsize=20)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(7,14))\n",
    "for i in range(3):\n",
    "    plt.subplot(321+i*2)\n",
    "    plt.imshow(data['X_test'][i].reshape(DIM, DIM), cmap='gray', interpolation='none')\n",
    "    if i == 0:\n",
    "        plt.title('Original 60x60', fontsize=20)\n",
    "    plt.axis('off')\n",
    "    plt.subplot(322+i*2)\n",
    "    plt.imshow(test_transform[i].reshape(DIM//3, DIM//3).T, cmap='gray', interpolation='none')\n",
    "    if i == 0:\n",
    "        plt.title('Transformed 20x20', fontsize=20)\n",
    "    plt.axis('off')\n",
    "    \n",
    "    \n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# A few pointers for image classification\n",
    "If you want do image classification using a pretrained model is often a good choice, especially if you have limited amounts of labeled data.\n",
    "\n",
    "An often used pretrained network is the Google Inception model. TensorFlow has a guide for using their current state-of-the-art pretrained model in their [model repository](https://github.com/tensorflow/models/tree/master/inception). Torch7 and Theano have similar pretrained models that you can find with google. \n",
    "\n",
    "Currently the best performing image network is the [ResNet](https://arxiv.org/pdf/1512.03385v1.pdf) model. Torch7 has an interesting blog post about residual nets. http://torch.ch/blog/2016/02/04/resnets.html"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
